0

Superstore Data Analysis.ipynb

  1. Superstore Sales Analysis
  2. Project Overview
  3. Objective
  4. Dataset Info
  5. **Key Questions to Answer**
  6. Importing Libraries & Loading Data
  7. Data Cleaning & Preparation
  8. Sales and Profit Over Time
  9. Insights
  10. Business Implication
    1. Insights
  11. Business Implication
  12. Sales and Profit by Category & Sub-Category
  13. Insights
  14. Business Implication
  15. Sales versus profit
  16. Insights
  17. Business Recommendations:
  18. Which states and regions generate the highest and lowest profit?
  19. 📈 Profit Analysis by State and Region
    1. 🔍 Insights
      1. ✅ High-Profit Areas
      2. ❌ Low/Negative-Profit Areas
    2. 💡 Business Recommendations
      1. 1. 🚀 Focus on High-Profit States
      2. 2. 🔎 Investigate Loss-Making States
      3. 3. 📍 Implement Regional Strategies
      4. 4. ⚙️ Optimize Operations in Central and South
    3. 5. 📊 Invest in Data-Driven Tools
    4. Which customer segments are the most valuable?
  20. ❓ Question: Which Customer Segments Are the Most Valuable?
    1. ✅ Answer:
      1. 🥇 Most Valuable Segment:
      2. 🥈 Second Most Valuable:
      3. 🥉 Least Valuable:
    2. 📌 Business Implication:
    3. Top 10 Customers by Sales
  21. 💼 Business Recommendations for Top 10 Customers by Sales
    1. 🎯 Objective:
    2. 🔝 Recommendations:
    3. 📈 Resulting Benefits:
    4. Sales by Region and Segment
  22. Sales by Region and Segment
    1. 🔍 Insights:
    2. ✅ Business Recommendations:
  23. Summary:
Skip to Main
Jupyter

Superstore Data Analysis

Last Checkpoint: 22 hours ago
  • File
  • Edit
  • View
  • Run
  • Kernel
  • Settings
  • Help
JupyterLab
Python 3 (ipykernel)
Kernel status: Idle
image/svg+xml
    ## Superstore Sales Analysis

    Superstore Sales Analysis¶

    ## Project Overview

    This project analyzes the **Superstore** dataset to uncover sales patterns, profit distribution, customer behaviour and regional performance.
    Using Python and Plotly Express, I created interactive visualizations to identify key business insights and suggest data-driven recommendations.

    Project Overview¶

    This project analyzes the Superstore dataset to uncover sales patterns, profit distribution, customer behaviour and regional performance. Using Python and Plotly Express, I created interactive visualizations to identify key business insights and suggest data-driven recommendations.

    ## Objective

    To explore the Superstore dataset and answer critical business questions to help improve decision-making in areas of sales, profit, shipping and customer segmentation.

    Objective¶

    To explore the Superstore dataset and answer critical business questions to help improve decision-making in areas of sales, profit, shipping and customer segmentation.

    ## Dataset Info

    - Source: Sample Superstore dataset from Kaggle
    - Fields: Order Date, Ship Date, Sales, Profit, Category,Product Name, Customer Name, Segment etc.

    Dataset Info¶

    • Source: Sample Superstore dataset from Kaggle
    • Fields: Order Date, Ship Date, Sales, Profit, Category,Product Name, Customer Name, Segment etc.
    ## **Key Questions to Answer**

    1. What are the overall sales and profit trends over time?

    2. Which product categories and sub-categories are the most/least profitable?

    3. Which states and regions generate the highest and lowest profit?

    4. Which customer segments are the most valuable?

    5. Top 10 Customers by Sales

    6. Sales by Region and Segment

    Key Questions to Answer¶

    1. What are the overall sales and profit trends over time?

    2. Which product categories and sub-categories are the most/least profitable?

    3. Which states and regions generate the highest and lowest profit?

    4. Which customer segments are the most valuable?

    5. Top 10 Customers by Sales

    6. Sales by Region and Segment

    ## Importing Libraries & Loading Data

    Importing Libraries & Loading Data¶

    [3]:
    import pandas as pd
    import plotly.express as px

    # load dataset
    df = pd.read_csv("C:/Users/DELL/Downloads/superstore_dataset.csv")

    #checking the first few rows

    df.head()
    [3]:
    order_id order_date ship_date customer manufactory product_name segment category subcategory region zip city state country discount profit quantity sales profit_margin
    0 US-2020-103800 1/3/2019 1/7/2019 Darren Powers Message Book Message Book, Wirebound, Four 5 1/2" X 4" Form... Consumer Office Supplies Paper Central 77095 Houston Texas United States 0.2 5.5512 2 16.448 0.3375
    1 US-2020-112326 1/4/2019 1/8/2019 Phillina Ober GBC GBC Standard Plastic Binding Systems Combs Home Office Office Supplies Binders Central 60540 Naperville Illinois United States 0.8 -5.4870 2 3.540 -1.5500
    2 US-2020-112326 1/4/2019 1/8/2019 Phillina Ober Avery Avery 508 Home Office Office Supplies Labels Central 60540 Naperville Illinois United States 0.2 4.2717 3 11.784 0.3625
    3 US-2020-112326 1/4/2019 1/8/2019 Phillina Ober SAFCO SAFCO Boltless Steel Shelving Home Office Office Supplies Storage Central 60540 Naperville Illinois United States 0.2 -64.7748 3 272.736 -0.2375
    4 US-2020-141817 1/5/2019 1/12/2019 Mick Brown Avery Avery Hi-Liter EverBold Pen Style Fluorescent ... Consumer Office Supplies Art East 19143 Philadelphia Pennsylvania United States 0.2 4.8840 3 19.536 0.2500
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 9994 entries, 0 to 9993
    Data columns (total 19 columns):
     #   Column         Non-Null Count  Dtype  
    ---  ------         --------------  -----  
     0   order_id       9994 non-null   object 
     1   order_date     9994 non-null   object 
     2   ship_date      9994 non-null   object 
     3   customer       9994 non-null   object 
     4   manufactory    9994 non-null   object 
     5   product_name   9994 non-null   object 
     6   segment        9994 non-null   object 
     7   category       9994 non-null   object 
     8   subcategory    9994 non-null   object 
     9   region         9994 non-null   object 
     10  zip            9994 non-null   int64  
     11  city           9994 non-null   object 
     12  state          9994 non-null   object 
     13  country        9994 non-null   object 
     14  discount       9994 non-null   float64
     15  profit         9994 non-null   float64
     16  quantity       9994 non-null   int64  
     17  sales          9994 non-null   float64
     18  profit_margin  9994 non-null   float64
    dtypes: float64(4), int64(2), object(13)
    memory usage: 1.4+ MB
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 9994 entries, 0 to 9993
    Data columns (total 20 columns):
     #   Column         Non-Null Count  Dtype         
    ---  ------         --------------  -----         
     0   order_id       9994 non-null   object        
     1   order_date     9994 non-null   datetime64[ns]
     2   ship_date      9994 non-null   datetime64[ns]
     3   customer       9994 non-null   object        
     4   manufactory    9994 non-null   object        
     5   product_name   9994 non-null   object        
     6   segment        9994 non-null   object        
     7   category       9994 non-null   object        
     8   subcategory    9994 non-null   object        
     9   region         9994 non-null   object        
     10  zip            9994 non-null   int64         
     11  city           9994 non-null   object        
     12  state          9994 non-null   object        
     13  country        9994 non-null   object        
     14  discount       9994 non-null   float64       
     15  profit         9994 non-null   float64       
     16  quantity       9994 non-null   int64         
     17  sales          9994 non-null   float64       
     18  profit_margin  9994 non-null   float64       
     19  Delivery Time  9994 non-null   int64         
    dtypes: datetime64[ns](2), float64(4), int64(3), object(11)
    memory usage: 1.5+ MB
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Cell In[9], line 10
          6 df.isnull().sum()
          8 #dropping the Order Date and Ship Date column
    ---> 10 df = df.drop(['Order Date', 'Ship Date'], axis = 1)
    
    File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:5581, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
       5433 def drop(
       5434     self,
       5435     labels: IndexLabel | None = None,
       (...)
       5442     errors: IgnoreRaise = "raise",
       5443 ) -> DataFrame | None:
       5444     """
       5445     Drop specified labels from rows or columns.
       5446 
       (...)
       5579             weight  1.0     0.8
       5580     """
    -> 5581     return super().drop(
       5582         labels=labels,
       5583         axis=axis,
       5584         index=index,
       5585         columns=columns,
       5586         level=level,
       5587         inplace=inplace,
       5588         errors=errors,
       5589     )
    
    File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4788, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
       4786 for axis, labels in axes.items():
       4787     if labels is not None:
    -> 4788         obj = obj._drop_axis(labels, axis, level=level, errors=errors)
       4790 if inplace:
       4791     self._update_inplace(obj)
    
    File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4830, in NDFrame._drop_axis(self, labels, axis, level, errors, only_slice)
       4828         new_axis = axis.drop(labels, level=level, errors=errors)
       4829     else:
    -> 4830         new_axis = axis.drop(labels, errors=errors)
       4831     indexer = axis.get_indexer(new_axis)
       4833 # Case for non-unique axis
       4834 else:
    
    File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:7070, in Index.drop(self, labels, errors)
       7068 if mask.any():
       7069     if errors != "ignore":
    -> 7070         raise KeyError(f"{labels[mask].tolist()} not found in axis")
       7071     indexer = indexer[~mask]
       7072 return self.delete(indexer)
    
    KeyError: "['Order Date', 'Ship Date'] not found in axis"
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 9994 entries, 0 to 9993
    Data columns (total 20 columns):
     #   Column         Non-Null Count  Dtype         
    ---  ------         --------------  -----         
     0   order_id       9994 non-null   object        
     1   order_date     9994 non-null   datetime64[ns]
     2   ship_date      9994 non-null   datetime64[ns]
     3   customer       9994 non-null   object        
     4   manufactory    9994 non-null   object        
     5   product_name   9994 non-null   object        
     6   segment        9994 non-null   object        
     7   category       9994 non-null   object        
     8   subcategory    9994 non-null   object        
     9   region         9994 non-null   object        
     10  zip            9994 non-null   int64         
     11  city           9994 non-null   object        
     12  state          9994 non-null   object        
     13  country        9994 non-null   object        
     14  discount       9994 non-null   float64       
     15  profit         9994 non-null   float64       
     16  quantity       9994 non-null   int64         
     17  sales          9994 non-null   float64       
     18  profit_margin  9994 non-null   float64       
     19  Delivery Time  9994 non-null   int64         
    dtypes: datetime64[ns](2), float64(4), int64(3), object(11)
    memory usage: 1.5+ MB
    
    Jul 2019Jan 2020Jul 2020Jan 2021Jul 2021Jan 2022Jul 2022050k100k
    Monthly Sales Trendorder_datesales
    plotly-logomark
    Jul 2019Jan 2020Jul 2020Jan 2021Jul 2021Jan 2022Jul 202205k10k15k
    Monthly Profit Trendorder_dateprofit
    plotly-logomark
    05k10k15k20k−500005000
    salesprofit
    plotly-logomark
    AlabamaArkansasFloridaGeorgiaKentuckyLouisianaMississippiNorth CarolinaSouth CarolinaTennesseeVirginiaArizonaCaliforniaColoradoIdahoMontanaNevadaNew MexicoOregonUtahWashingtonWyomingConnecticutDelawareDistrict of ColumbiaMaineMarylandMassachusettsNew HampshireNew JerseyNew YorkOhioPennsylvaniaRhode IslandVermontWest VirginiaIllinoisIndianaIowaKansasMichiganMinnesotaMissouriNebraskaNorth DakotaOklahomaSouth DakotaTexasWisconsin050k
    regionSouthWestEastCentralstateprofit
    plotly-logomark
    ConsumerCorporateHome Office050k100k
    segmentConsumerCorporateHome Officesegmentprofit
    plotly-logomark
    Top 10 Customers by Sales:
                 customer      sales
    0         Sean Miller  25043.050
    1        Tamara Chand  19052.218
    2        Raymond Buch  15117.339
    3        Tom Ashbrook  14595.620
    4       Adrian Barton  14473.571
    5        Ken Lonsdale  14175.229
    6        Sanjit Chand  14142.334
    7        Hunter Lopez  12873.298
    8        Sanjit Engle  12209.438
    9  Christopher Conant  12129.072
    
    CentralEastSouthWest0200k400k600k
    segmentConsumerCorporateHome Officeregionsales
    plotly-logomark
    Common Tools
    No metadata.
    Advanced Tools
    No metadata.
    Anaconda Assistant
    AI-powered coding, insights and debugging in your notebooks.
    To enable the following extensions, create an account or sign in.
    • Anaconda Assistant
      4.0.15
    • Coming soon!
    • Data Catalogs
    • Panel Deployments
    • Sharing
    Already have an account? Sign In
    For more information, read our Anaconda Assistant documentation.
    Alt+[
    Alt+]
    Alt+End
    • Assistant
    • Open Anaconda Assistant
    • Console
    • Change Kernel…
    • Clear Console Cells
    • Close and Shut Down…
    • Insert Line Break
    • Interrupt Kernel
    • New Console
    • Restart Kernel…
    • Run Cell (forced)
    • Run Cell (unforced)
    • Show All Kernel Activity
    • Display Languages
    • English
      English
    • File Operations
    • Autosave Documents
    • Download
      Download the file to your computer
    • Reload Notebook from Disk
      Reload contents from disk
    • Revert Notebook to Checkpoint…
      Revert contents to previous checkpoint
    • Save Notebook
      Save and create checkpoint
      Ctrl+S
    • Save Notebook As…
      Save with new path
      Ctrl+Shift+S
    • Trust HTML File
      Whether the HTML file is trusted. Trusting the file allows scripts to run in it, which may result in security risks. Only enable for files you trust.
    • Help
    • About Jupyter Notebook
    • Jupyter Reference
    • JupyterLab FAQ
    • JupyterLab Reference
    • Launch Jupyter Notebook File Browser
    • Markdown Reference
    • Show Keyboard Shortcuts…
      Show relevant keyboard shortcuts for the current active widget
      Ctrl+Shift+H
    • Image Viewer
    • Flip image horizontally
      H
    • Flip image vertically
      V
    • Invert Colors
      I
    • Reset Image
      0
    • Rotate Clockwise
      ]
    • Rotate Counterclockwise
      [
    • Zoom In
      =
    • Zoom Out
      -
    • Kernel Operations
    • Shut Down All Kernels…
    • Main Area
    • Close All Other Tabs
    • Close Tab
      Alt+W
    • Close Tabs to Right
    • End Search
      Esc
    • Find Next
      Ctrl+G
    • Find Previous
      Ctrl+Shift+G
    • Find…
      Ctrl+F
    • Log Out
      Log out of Jupyter Notebook
    • Search in Selection
      Alt+L
    • Shut Down
      Shut down Jupyter Notebook
    • Mode
    • Toggle Zen Mode
    • Notebook Cell Operations
    • Change to Code Cell Type
      Y
    • Change to Heading 1
      1
    • Change to Heading 2
      2
    • Change to Heading 3
      3
    • Change to Heading 4
      4
    • Change to Heading 5
      5
    • Change to Heading 6
      6
    • Change to Markdown Cell Type
      M
    • Change to Raw Cell Type
      R
    • Clear Cell Output
      Clear outputs for the selected cells
    • Collapse All Code
    • Collapse All Outputs
    • Collapse Selected Code
    • Collapse Selected Outputs
    • Copy Cell
      Copy this cell
      C
    • Cut Cell
      Cut this cell
      X
    • Delete Cell
      Delete this cell
      D, D
    • Disable Scrolling for Outputs
    • Enable Scrolling for Outputs
    • Expand All Code
    • Expand All Outputs
    • Expand Selected Code
    • Expand Selected Outputs
    • Extend Selection Above
      Shift+K
    • Extend Selection Below
      Shift+J
    • Extend Selection to Bottom
      Shift+End
    • Extend Selection to Top
      Shift+Home
    • Insert Cell Above
      Insert a cell above
      A
    • Insert Cell Below
      Insert a cell below
      B
    • Insert Heading Above Current Heading
      Shift+A
    • Insert Heading Below Current Heading
      Shift+B
    • Merge Cell Above
      Ctrl+Backspace
    • Merge Cell Below
      Ctrl+Shift+M
    • Merge Selected Cells
      Shift+M
    • Move Cell Down
      Move this cell down
      Ctrl+Shift+Down
    • Move Cell Up
      Move this cell up
      Ctrl+Shift+Up
    • Paste Cell Above
      Paste this cell from the clipboard
    • Paste Cell and Replace
    • Paste Cell Below
      Paste this cell from the clipboard
      V
    • Redo Cell Operation
      Shift+Z
    • Render Side-by-Side
      Shift+R
    • Run Selected Cell
      Run this cell and advance
      Shift+Enter
    • Run Selected Cell and Do not Advance
      Ctrl+Enter
    • Run Selected Cell and Insert Below
      Alt+Enter
    • Run Selected Text or Current Line in Console
    • Select Cell Above
      K
    • Select Cell Below
      J
    • Select Heading Above or Collapse Heading
      Left
    • Select Heading Below or Expand Heading
      Right
    • Set side-by-side ratio
    • Split Cell
      Ctrl+Shift+-
    • Undo Cell Operation
      Z
    • Notebook Operations
    • Access Next Kernel History Entry
      Alt+Down
    • Access Previous Kernel History Entry
      Alt+Up
    • Change Kernel…
    • Clear Outputs of All Cells
      Clear all outputs of all cells
    • Close and Shut Down Notebook…
    • Collapse All Headings
      Ctrl+Shift+Left
    • Deselect All Cells
    • Edit Notebook Metadata
    • Enter Command Mode
      Ctrl+M
    • Enter Edit Mode
      Enter
    • Expand All Headings
      Ctrl+Shift+Right
    • Interrupt Kernel
      Interrupt the kernel
    • New Console for Notebook
    • New Notebook
      Create a new notebook
    • Open with Panel in New Browser Tab
    • Preview Notebook with Panel
    • Reconnect to Kernel
    • Render All Markdown Cells
    • Restart Kernel and Clear Outputs of All Cells…
      Restart the kernel and clear all outputs of all cells
    • Restart Kernel and Debug…
      Restart Kernel and Debug…
    • Restart Kernel and Run All Cells…
      Restart the kernel and run all cells
    • Restart Kernel and Run up to Selected Cell…
    • Restart Kernel…
      Restart the kernel
    • Run All Above Selected Cell
    • Run All Cells
      Run all cells
    • Run Selected Cell and All Below
    • Save and Export Notebook: Asciidoc
    • Save and Export Notebook: Executable Script
    • Save and Export Notebook: HTML
    • Save and Export Notebook: LaTeX
    • Save and Export Notebook: Markdown
    • Save and Export Notebook: PDF
    • Save and Export Notebook: Qtpdf
    • Save and Export Notebook: Qtpng
    • Save and Export Notebook: ReStructured Text
    • Save and Export Notebook: Reveal.js Slides
    • Save and Export Notebook: Webpdf
    • Select All Cells
      Ctrl+A
    • Show Line Numbers
    • Toggle Collapse Notebook Heading
    • Trust Notebook
    • Other
    • Open in JupyterLab
      JupyterLab
    • Plugin Manager
    • Advanced Plugin Manager
    • Terminal
    • Decrease Terminal Font Size
    • Increase Terminal Font Size
    • New Terminal
      Start a new terminal session
    • Refresh Terminal
      Refresh the current terminal session
    • Use Terminal Theme: Dark
      Set the terminal theme
    • Use Terminal Theme: Inherit
      Set the terminal theme
    • Use Terminal Theme: Light
      Set the terminal theme
    • Text Editor
    • Decrease Font Size
    • Increase Font Size
    • New Markdown File
      Create a new markdown file
    • New Python File
      Create a new Python file
    • New Text File
      Create a new text file
    • Spaces: 1
    • Spaces: 2
    • Spaces: 4
    • Spaces: 4
    • Spaces: 8
    • Theme
    • Decrease Code Font Size
    • Decrease Content Font Size
    • Decrease UI Font Size
    • Increase Code Font Size
    • Increase Content Font Size
    • Increase UI Font Size
    • Set Preferred Dark Theme: JupyterLab Dark
    • Set Preferred Dark Theme: JupyterLab Dark High Contrast
    • Set Preferred Dark Theme: JupyterLab Light
    • Set Preferred Light Theme: JupyterLab Dark
    • Set Preferred Light Theme: JupyterLab Dark High Contrast
    • Set Preferred Light Theme: JupyterLab Light
    • Synchronize Styling Theme with System Settings
    • Theme Scrollbars
    • Use Theme: JupyterLab Dark
    • Use Theme: JupyterLab Dark High Contrast
    • Use Theme: JupyterLab Light
    • View
    • File Browser
    • Open JupyterLab
    • Show Anaconda Assistant
      Show Show Anaconda Assistant in the right sidebar
    • Show Header
    • Show Notebook Tools
      Show Show Notebook Tools in the right sidebar
    • Show Table of Contents
      Show Show Table of Contents in the left sidebar